Fix controller hang when submission exits early#1695
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1695 +/- ##
==========================================
- Coverage 53.90% 53.90% -0.01%
==========================================
Files 341 341
Lines 27983 27985 +2
==========================================
+ Hits 15084 15085 +1
- Misses 12899 12900 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Maybe I am missing something, but not closing the pipe can potentially make the problem much worse. If the solution sends multiple queries and then it exits, the controller will try to print answers for all the queries, filling up the pipe buffer and blocking indefinitely. I would argue that it is much better to ignore both |
|
That issue can occur regardless. A solution which sends a large number of queries without reading responses will cause the checker->solution pipe to fill up, which causes the checker to stop processing queries, and then after a while the solution->checker pipe fills up too and we get a deadlock. In Kattis, this has been mitigated somewhat by increasing pipe buffer size to the Linux maximum of 1 MiB, and it is eventually planned to be fixed for real by introducing a proxy process (see Kattis/problemtools#113; additional upsides of that solution is being able to log the interaction and show it for sample testcases, and being able to impose communication traffic limits to eliminate judge errors from resource exhaustion caused by unbounded-size I suppose a small difference compared to that scenario is that with the pipe left open the checker could be the one left to take the blame, if the solution exits early and the judging system treats checker walltime limit timeouts as judge errors. Anyway, this not being a hill I want to die on, I just pushed a commit that closes the pipe. |
|
i can think of a way to avoid the deadlock, though it requires a bit of extra complexity on the manager side.
while this is somewhat complicated, i think it can be abstracted away into a header-only library a la testlib.h. i think this should have minimal performance impact if the solution is well-behaved and reads its end of the pipe normally (in which case the write() should rarely fail at all, at least when assuming that the pipe buffer is large enough for one query's response). in any case, it avoids the overhead of another proxy process and all the associated context switches. |
Having the
u_to_c_wfd inherit to the controller means that when the submission exits, the controller will never read EOF, because it still itself has the write end of the pipe open. Easy fix is just to make that not inherit: nothing actually needs the inheritance, in fact it's rather a bit of a security issue that each spawned submission get access to extra fd's.Small additional related fixes:
c_to_u_rnot inherit means that now if the submission exits, controller writing will cause SIGPIPE unless the controller ignores that signal. We don't particularly want that to happen (it makes it impossible to detect which process exited first, which we really ought to do to give proper verdicts); better to leave the pipe unclosed.